This set of exercises is about exploratory factor analysis. We shall use some basic features of the psych package. For a quick introduction to exploratory factor analysis and the psych package, we recommend this short “how to” guide.
You can download the dataset here. The data is fictitious.
Answers to the exercises are available here.
If you have a different solution, feel free to post it.
Exercise 1
Install the packages psych and GPArotation, which we will use in the following exercises, and load them. Then load the data and describe it.
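####################
#                  #
#    Exercise 1    #
#                  #
####################

# Install the packages once, then attach them
install.packages(c("psych", "GPArotation"))
library(psych)
library(GPArotation)

# Read the dataset (assumes efa.csv is in the working directory)
data <- read.csv("efa.csv")
describe(data)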
##     vars   n mean   sd median trimmed  mad min max range  skew kurtosis
## V1     1 649 4.79 0.47      5    4.89 0.00   3   5     2 -2.16     3.96
## V2     2 649 4.77 0.53      5    4.89 0.00   2   5     3 -2.63     7.59
## V3     3 649 4.62 0.72      5    4.79 0.00   1   5     4 -2.13     4.73
## V4     4 649 4.84 0.45      5    4.96 0.00   2   5     3 -3.13    10.87
## V5     5 649 4.85 0.44      5    4.97 0.00   2   5     3 -3.31    12.40
## V6     6 649 4.83 0.48      5    4.95 0.00   2   5     3 -3.31    12.40
## V7     7 649 4.71 0.61      5    4.85 0.00   2   5     3 -2.20     4.59
## V8     8 649 4.70 0.62      5    4.85 0.00   2   5     3 -2.22     4.70
## V9     9 649 4.50 0.85      5    4.69 0.00   1   5     4 -1.97     3.86
## V10   10 649 4.69 0.72      5    4.86 0.00   1   5     4 -2.81     8.78
## V11   11 649 4.61 0.89      5    4.86 0.00   1   5     4 -2.65     6.63
## V12   12 649 4.39 1.12      5    4.67 0.00   1   5     4 -1.86     2.39
## V13   13 649 4.68 0.62      5    4.80 0.00   1   5     4 -2.27     5.96
## V14   14 649 4.37 0.80      5    4.50 0.00   1   5     4 -1.17     0.93
## V15   15 649 4.61 0.62      5    4.71 0.00   2   5     3 -1.60     2.54
## V16   16 649 4.71 0.60      5    4.85 0.00   1   5     4 -2.23     5.09
## V17   17 649 4.71 0.62      5    4.85 0.00   1   5     4 -2.42     6.32
## V18   18 649 4.72 0.60      5    4.85 0.00   1   5     4 -2.50     7.29
## V19   19 649 4.56 0.73      5    4.72 0.00   1   5     4 -1.70     2.68
## V20   20 649 4.42 0.90      5    4.60 0.00   1   5     4 -1.73     2.80
## V21   21 649 3.30 0.96      3    3.29 1.48   1   5     4 -0.16    -1.03
## V22   22 649 2.98 1.10      3    2.89 1.48   1   5     4  0.41    -0.98
## V23   23 649 3.27 1.23      3    3.29 1.48   1   5     4 -0.10    -1.11
## V24   24 649 2.69 1.01      3    2.68 1.48   1   5     4  0.33    -0.64
## V25   25 649 2.85 1.07      3    2.85 1.48   1   5     4  0.14    -0.92
## V26   26 649 3.69 0.83      4    3.75 0.00   1   5     4 -0.74     0.53
##       se
## V1  0.02
## V2  0.02
## V3  0.03
## V4  0.02
## V5  0.02
## V6  0.02
## V7  0.02
## V8  0.02
## V9  0.03
## V10 0.03
## V11 0.03
## V12 0.04
## V13 0.02
## V14 0.03
## V15 0.02
## V16 0.02
## V17 0.02
## V18 0.02
## V19 0.03
## V20 0.04
## V21 0.04
## V22 0.04
## V23 0.05
## V24 0.04
## V25 0.04
## V26 0.03
Exercise 2
Using parallel analysis, determine the number of factors.
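For intuition, the idea behind parallel analysis can be sketched in base R: eigenvalues of the observed correlation matrix are compared with eigenvalues obtained from random data of the same shape, and only the leading dimensions that beat the random benchmark are kept. This is a minimal sketch of the principal-components variant on simulated data, not a reproduction of what fa.parallel does internally; the data frame `toy`, its two latent factors, and the 95th-percentile threshold are assumptions made for illustration.

```r
# Minimal parallel-analysis sketch (principal-components variant, base R only).
# All data here are simulated; `toy` is a hypothetical dataset with two factors.
set.seed(1)
n <- 500
p <- 6
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.6),
  a2 = f1 + rnorm(n, sd = 0.6),
  a3 = f1 + rnorm(n, sd = 0.6),
  b1 = f2 + rnorm(n, sd = 0.6),
  b2 = f2 + rnorm(n, sd = 0.6),
  b3 = f2 + rnorm(n, sd = 0.6)
)

# Eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(toy), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues of correlation matrices from pure noise of the same size
sim_eig <- replicate(100, {
  noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values
})

# Assumed benchmark: 95th percentile of the random eigenvalues at each position
threshold <- apply(sim_eig, 1, quantile, probs = 0.95)

# Keep the leading components whose eigenvalue exceeds the random benchmark
exceeds <- obs_eig > threshold
n_retain <- if (all(exceeds)) p else which(!exceeds)[1] - 1
print(n_retain)
```

The count stops at the first eigenvalue that fails to exceed its benchmark, so a later eigenvalue that happens to exceed the threshold by chance is not counted.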
####################
#                  #
#    Exercise 2    #
#                  #
####################

fa.parallel(data)
## Parallel analysis suggests that the number of factors = 5 and the number of components = 3
Exercise 3
Determine the number of factors using the Very Simple Structure method.
####################
#                  #
#    Exercise 3    #
#                  #
####################

vss(data)
##
## Very Simple Structure
## Call: vss(x = data)
## VSS complexity 1 achieves a maximimum of 0.91 with 2 factors
## VSS complexity 2 achieves a maximimum of 0.93 with 3 factors
##
## The Velicer MAP achieves a minimum of 0.01 with 3 factors
## BIC achieves a minimum of -742.22 with 5 factors
## Sample Size adjusted BIC achieves a minimum of -160.17 with 8 factors
##
## Statistics by number of factors
##   vss1 vss2   map dof chisq     prob sqresid  fit RMSEA  BIC SABIC complex
## 1 0.88 0.00 0.014 299  2003 1.9e-249    14.6 0.88 0.095   67  1016     1.0
## 2 0.91 0.91 0.014 274  1546 3.3e-176    10.5 0.91 0.086 -228   642     1.0
## 3 0.62 0.93 0.013 250  1095 4.9e-106     9.0 0.93 0.073 -524   270     1.4
## 4 0.56 0.86 0.015 227   783  2.9e-62     8.1 0.93 0.062 -687    34     1.7
## 5 0.40 0.81 0.019 205   585  2.8e-38     7.3 0.94 0.054 -742   -91     1.9
## 6 0.40 0.81 0.022 184   502  2.4e-31     6.4 0.95 0.053 -689  -105     2.0
## 7 0.38 0.82 0.024 164   425  4.8e-25     5.7 0.95 0.050 -637  -116     2.1
## 8 0.35 0.75 0.027 145   318  5.0e-15     5.3 0.96 0.044 -621  -160     2.3
##   eChisq  SRMR eCRMS  eBIC
## 1   2201 0.072 0.075   265
## 2   1093 0.051 0.055  -682
## 3    665 0.040 0.045  -953
## 4    461 0.033 0.040 -1009
## 5    320 0.028 0.035 -1007
## 6    237 0.024 0.031  -955
## 7    162 0.020 0.028  -900
## 8    115 0.017 0.025  -824
Exercise 4
Based on a normality test, is the maximum likelihood factoring method appropriate, or is OLS/minres better? (Tip: the maximum likelihood method assumes normally distributed data.)
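When there are many variables, the full test objects are hard to scan, so it can help to pull out only the p-values. The following is a minimal sketch on simulated data; the data frame `toy`, its two columns, and the 5% significance level are assumptions chosen for illustration.

```r
# Sketch: summarise Shapiro-Wilk tests by keeping only the p-value per column.
# `toy` is simulated: one strongly skewed 1-5 rating item and one normal variable.
set.seed(42)
toy <- data.frame(
  skewed = sample(1:5, 300, replace = TRUE,
                  prob = c(0.02, 0.03, 0.05, 0.15, 0.75)),
  normal = rnorm(300)
)

# One p-value per column instead of the full htest object
p_values <- sapply(toy, function(x) shapiro.test(x)$p.value)

# Columns for which normality is rejected at the (assumed) 5% level
non_normal <- names(p_values)[p_values < 0.05]

print(signif(p_values, 3))
print(non_normal)
```

If most items end up in `non_normal`, that points toward a factoring method that does not rely on normality, such as minres.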
####################
#                  #
#    Exercise 4    #
#                  #
####################

sapply(data, shapiro.test)
##           V1                            V2
## statistic 0.4880158                     0.4818245
## p.value   1.790874e-39                  1.218162e-39
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V3                            V4
## statistic 0.5837358                     0.4039763
## p.value   1.193739e-36                  1.287584e-41
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V5                            V6
## statistic 0.386337                      0.4010764
## p.value   4.917495e-42                  1.097388e-41
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V7                            V8
## statistic 0.5352881                     0.5343438
## p.value   3.876222e-38                  3.636428e-38
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V9                            V10
## statistic 0.631653                      0.4956688
## p.value   4.943564e-35                  2.898898e-39
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V11                           V12
## statistic 0.4952979                     0.6048789
## p.value   2.83163e-39                   5.900295e-36
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V13                           V14
## statistic 0.5630808                     0.7476871
## p.value   2.666746e-37                  2.854301e-30
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V15                           V16
## statistic 0.6405669                     0.5328645
## p.value   1.031309e-34                  3.290914e-38
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V17                           V18
## statistic 0.5253764                     0.5207042
## p.value   1.993172e-38                  1.462449e-38
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V19                           V20
## statistic 0.6448004                     0.6737761
## p.value   1.469911e-34                  1.828638e-33
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V21                           V22
## statistic 0.8599259                     0.8613114
## p.value   1.369292e-23                  1.744914e-23
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V23                           V24
## statistic 0.8992078                     0.8899322
## p.value   3.080095e-20                  4.157814e-21
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
##           V25                           V26
## statistic 0.8948388                     0.8292828
## p.value   1.17986e-20                   9.748275e-26
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"
Exercise 5
Using oblimin rotation, five factors, and the factoring method chosen in the previous exercise, find the factor solution. Print the loadings table with a cutoff of 0.3.
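The cutoff only affects what is displayed: loadings whose absolute value falls below it are blanked in the printout, while the fitted solution is unchanged. A minimal base-R sketch of that display rule, using a small hypothetical loading matrix `L` (the item names and values are made up for illustration):

```r
# Sketch: blank out loadings below a cutoff for display purposes only.
# `L` is a hypothetical 4 x 2 loading matrix, not taken from the exercise data.
L <- matrix(c(0.72, 0.10,
              0.65, 0.25,
              0.05, 0.81,
              0.31, 0.48),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("item", 1:4), c("F1", "F2")))

cutoff <- 0.3

# Keep entries with |loading| >= cutoff, show the rest as empty strings
shown <- ifelse(abs(L) >= cutoff, formatC(L, format = "f", digits = 2), "")

print(noquote(shown))
```

Items with an entry in more than one column after blanking (here `item4`) are cross-loading items worth a closer look.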
####################
#                  #
#    Exercise 5    #
#                  #
####################

f.solution <- fa(data, nfactors=5, rotate="oblimin", fm="minres")
print(f.solution$loadings, cutoff=0.3)
##
## Loadings:
##     MR1 MR3 MR5 MR2 MR4
## V1 0.662
## V2 0.675
## V3 0.740
## V4 0.664
## V5 0.460
## V6 0.448 0.336
## V7 0.722
## V8 0.648
## V9 0.652
## V10 0.759
## V11 0.546
## V12 0.344 0.465
## V13 0.483
## V14 0.563
## V15 0.566
## V16 0.621
## V17 0.676
## V18 0.495
## V19 0.866
## V20 0.524
## V21 0.316
## V22
## V23 0.820
## V24 0.616
## V25
## V26 0.448
##
##                  MR1   MR3   MR5   MR2   MR4
## SS loadings    2.887 2.452 2.381 1.522 1.283
## Proportion Var 0.111 0.094 0.092 0.059 0.049
## Cumulative Var 0.111 0.205 0.297 0.355 0.405
Exercise 6
Plot the factor loadings.
####################
#                  #
#    Exercise 6    #
#                  #
####################

plot(f.solution, title="Factor loadings")
Exercise 7
Plot the structure diagram.
####################
#                  #
#    Exercise 7    #
#                  #
####################

fa.diagram(f.solution, main="Structural diagram")
Exercise 8
Find the higher-order factor model with five factors plus a general factor.
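The omega function reports Omega Hierarchical and Omega Total. As a rough guide to how these coefficients relate to the loadings, here is a minimal sketch that computes both from a Schmid-Leiman style loading matrix using the standard definitions (general-factor variance over total variance, and all common variance over total variance). The loadings below are hypothetical, chosen only to keep the arithmetic easy to follow, and the sketch does not reproduce the estimation steps omega itself performs.

```r
# Sketch: omega hierarchical and omega total from a hypothetical
# Schmid-Leiman loading matrix (one general factor g, two group factors).
g  <- rep(0.6, 6)
F1 <- c(0.4, 0.4, 0.4, 0, 0, 0)
F2 <- c(0, 0, 0, 0.4, 0.4, 0.4)
Lambda <- cbind(g = g, F1 = F1, F2 = F2)

# Uniqueness of each item: 1 minus its communality
u2 <- 1 - rowSums(Lambda^2)

# Model-implied correlation matrix (orthogonal factors assumed)
Sigma <- Lambda %*% t(Lambda) + diag(u2)

# Variance of the unit-weighted total score
total_var <- sum(Sigma)

# Share of total-score variance due to the general factor only
omega_h <- sum(Lambda[, "g"])^2 / total_var

# Share of total-score variance due to all common factors
omega_t <- (total_var - sum(u2)) / total_var

print(round(c(omega_h = omega_h, omega_t = omega_t), 3))
```

A large gap between the two values suggests that group factors, not only the general factor, account for a meaningful part of the reliable variance.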
####################
#                  #
#    Exercise 8    #
#                  #
####################

omega(data, nfactors = 5, sl=FALSE, title="Higher-order factor solution")
## Higher-order factor solution
## Call: omega(m = data, nfactors = 5, title = "Higher-order factor solution",
##     sl = FALSE)
## Alpha:                 0.91
## G.6:                   0.94
## Omega Hierarchical:    0.81
## Omega H asymptotic:    0.86
## Omega Total            0.94
##
## Schmid Leiman Factor loadings greater than 0.2
##     g F1* F2* F3* F4* F5* h2 u2 p2
## V1 0.62 0.45 0.58 0.42 0.65
## V2 0.62 0.46 0.59 0.41 0.65
## V3 0.61 0.50 0.63 0.37 0.58
## V4 0.62 0.45 0.60 0.40 0.63
## V5 0.62 0.31 0.50 0.50 0.77
## V6 0.70 0.23 0.23 0.62 0.38 0.80
## V7 0.72 0.37 0.65 0.35 0.79
## V8 0.74 0.33 0.66 0.34 0.82
## V9 0.76 0.33 0.69 0.31 0.83
## V10 0.77 0.38 0.76 0.24 0.79
## V11 0.59 0.28 0.22 0.46 0.54 0.74
## V12 0.64 0.27 0.52 0.48 0.79
## V13 0.75 0.28 0.69 0.31 0.82
## V14 0.65 0.33 0.57 0.43 0.73
## V15 0.62 0.33 0.50 0.50 0.76
## V16 0.67 0.36 0.60 0.40 0.75
## V17 0.68 0.39 0.63 0.37 0.72
## V18 0.60 0.29 0.48 0.52 0.75
## V19 0.57 0.69 0.79 0.21 0.41
## V20 0.60 0.41 0.57 0.43 0.63
## V21 0.32 0.15 0.85 0.01
## V22 0.30 0.10 0.90 0.00
## V23- -0.82 0.68 0.32 0.00
## V24- -0.62 0.38 0.62 0.00
## V25 0.23 0.08 0.92 0.00
## V26- -0.45 0.20 0.80 0.00
##
## With eigenvalues of:
##    g  F1*  F2*  F3*  F4*  F5*
## 8.68 0.81 1.12 0.74 1.52 0.80
##
## general/max 5.71 max/min = 2.06
## mean percent general = 0.55 with sd = 0.32 and cv of 0.58
## Explained Common Variance of the general factor = 0.64
##
## The degrees of freedom are 205 and the fit is 0.92
## The number of observations was 649 with Chi Square = 585.24 with prob < 2.8e-38
## The root mean square of the residuals is 0.03
## The df corrected root mean square of the residuals is 0.03
## RMSEA index = 0.054 and the 10 % confidence intervals are 0.048 0.059
## BIC = -742.22
##
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 299 and the fit is 3.3
## The number of observations was 649 with Chi Square = 2103.59 with prob < 3.5e-268
## The root mean square of the residuals is 0.09
## The df corrected root mean square of the residuals is 0.09
##
## RMSEA index = 0.097 and the 10 % confidence intervals are 0.093 0.1
## BIC = 167.43
##
## Measures of factor score adequacy
##                                                  g   F1*  F2*   F3*  F4*
## Correlation of scores with factors            0.92  0.68 0.77  0.64 0.88
## Multiple R square of scores with factors      0.84  0.46 0.59  0.41 0.77
## Minimum correlation of factor score estimates 0.67 -0.08 0.19 -0.19 0.55
##                                                F5*
## Correlation of scores with factors            0.82
## Multiple R square of scores with factors      0.68
## Minimum correlation of factor score estimates 0.35
##
## Total, General and Subset omega for each subset
##                                                  g  F1*  F2*  F3*  F4*
## Omega total for total scores and subscales    0.94 0.88 0.87 0.88 0.21
## Omega general for total scores and subscales  0.81 0.71 0.62 0.72 0.00
## Omega group for total scores and subscales    0.07 0.17 0.25 0.16 0.21
##                                                F5*
## Omega total for total scores and subscales    0.78
## Omega general for total scores and subscales  0.42
## Omega group for total scores and subscales    0.37
Exercise 9
Find the bifactor solution.
####################
#                  #
#    Exercise 9    #
#                  #
####################

omega(data, title="Bifactor solution")
## Bifactor solution
## Call: omega(m = data, title = "Bifactor solution")
## Alpha:                 0.91
## G.6:                   0.94
## Omega Hierarchical:    0.86
## Omega H asymptotic:    0.92
## Omega Total            0.94
##
## Schmid Leiman Factor loadings greater than 0.2
##     g F1* F2* F3* h2 u2 p2
## V1 0.59 0.44 0.54 0.46 0.64
## V2 0.58 0.48 0.56 0.44 0.59
## V3 0.54 0.56 0.63 0.37 0.47
## V4 0.56 0.54 0.60 0.40 0.52
## V5 0.61 0.37 0.50 0.50 0.73
## V6 0.68 0.35 0.59 0.41 0.79
## V7 0.72 0.21 0.56 0.44 0.91
## V8 0.74 0.22 0.61 0.39 0.92
## V9 0.77 0.64 0.36 0.94
## V10 0.77 0.26 0.66 0.34 0.89
## V11 0.61 0.38 0.62 0.97
## V12 0.70 0.50 0.50 0.98
## V13 0.83 0.70 0.30 1.00
## V14 0.75 0.58 0.42 0.97
## V15 0.69 0.48 0.52 0.99
## V16 0.74 0.56 0.44 0.99
## V17 0.71 0.53 0.47 0.97
## V18 0.65 0.44 0.56 0.97
## V19 0.63 0.41 0.59 0.98
## V20 0.66 0.44 0.56 0.98
## V21 0.34 0.13 0.87 0.02
## V22 0.27 0.07 0.93 0.00
## V23- -0.81 0.66 0.34 0.00
## V24- -0.61 0.37 0.63 0.00
## V25 0.23 0.06 0.94 0.00
## V26- -0.44 0.19 0.81 0.00
##
## With eigenvalues of:
##    g  F1*  F2*  F3*
## 9.28 0.03 1.55 1.53
##
## general/max 5.98 max/min = 57.78
## mean percent general = 0.66 with sd = 0.4 and cv of 0.6
## Explained Common Variance of the general factor = 0.75
##
## The degrees of freedom are 250 and the fit is 1.72
## The number of observations was 649 with Chi Square = 1095.18 with prob < 4.9e-106
## The root mean square of the residuals is 0.04
## The df corrected root mean square of the residuals is 0.05
## RMSEA index = 0.073 and the 10 % confidence intervals are 0.068 0.077
## BIC = -523.68
##
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 299 and the fit is 3.37
## The number of observations was 649 with Chi Square = 2150.99 with prob < 4.8e-277
## The root mean square of the residuals is 0.09
## The df corrected root mean square of the residuals is 0.09
##
## RMSEA index = 0.099 and the 10 % confidence intervals are 0.094 0.102
## BIC = 214.83
##
## Measures of factor score adequacy
##                                                  g   F1*  F2*  F3*
## Correlation of scores with factors            0.97  0.08 0.83 0.87
## Multiple R square of scores with factors      0.93  0.01 0.69 0.76
## Minimum correlation of factor score estimates 0.86 -0.99 0.38 0.53
##
## Total, General and Subset omega for each subset
##                                                  g  F1*  F2*  F3*
## Omega total for total scores and subscales    0.94 0.74 0.94 0.51
## Omega general for total scores and subscales  0.86 0.74 0.82 0.43
## Omega group for total scores and subscales    0.07 0.00 0.13 0.08
Exercise 10
Reduce the number of dimensions using hierarchical clustering analysis.
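To see the general idea of clustering variables rather than observations, the sketch below uses base R's hclust with a correlation-based distance. This is a deliberately simpler, swapped-in technique: it is not the ICLUST algorithm, which uses reliability-based criteria to decide when to stop merging clusters. The simulated data frame `toy`, the distance 1 - |r|, average linkage, and the choice of two clusters are all assumptions for illustration.

```r
# Sketch: cluster variables (not rows) with plain hierarchical clustering.
# `toy` is simulated so that a1-a3 share one latent factor and b1-b2 another.
set.seed(7)
n <- 400
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.5),
  a2 = f1 + rnorm(n, sd = 0.5),
  a3 = f1 + rnorm(n, sd = 0.5),
  b1 = f2 + rnorm(n, sd = 0.5),
  b2 = f2 + rnorm(n, sd = 0.5)
)

# Highly correlated variables get a small distance
d <- as.dist(1 - abs(cor(toy)))

# Average-linkage agglomerative clustering of the variables
tree <- hclust(d, method = "average")

# Cut the tree into two groups of variables
groups <- cutree(tree, k = 2)
print(groups)
```

Each resulting group could then be summarised by a single scale score, which is the sense in which clustering reduces the number of dimensions.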
####################
#                  #
#    Exercise 10   #
#                  #
####################

iclust(data, title="Clustering solution")
## ICLUST (Item Cluster Analysis)
## Call: iclust(r.mat = data, title = "Clustering solution")
##
## Purified Alpha:
##  C21  C24
## 0.95 0.60
##
## G6* reliability:
## C21 C24
##   1   1
##
## Original Beta:
##  C21  C24
## 0.84 0.36
##
## Cluster size:
## C21 C24
##  20   6
##
## Item by Cluster Structure matrix:
##       O   P   C21   C24
## V1  C21 C21  0.69 -0.01
## V2  C21 C21  0.69 -0.01
## V3  C21 C21  0.67 -0.13
## V4  C21 C21  0.68  0.02
## V5  C21 C21  0.68 -0.02
## V6  C21 C21  0.75 -0.01
## V7  C21 C21  0.75  0.03
## V8  C21 C21  0.78  0.03
## V9  C21 C21  0.80  0.01
## V10 C21 C21  0.81  0.05
## V11 C21 C21  0.62 -0.03
## V12 C21 C21  0.67  0.08
## V13 C21 C21  0.80  0.01
## V14 C21 C21  0.69 -0.07
## V15 C21 C21  0.65  0.04
## V16 C21 C21  0.72  0.08
## V17 C21 C21  0.72  0.01
## V18 C21 C21  0.65 -0.05
## V19 C21 C21  0.61 -0.05
## V20 C21 C21  0.66 -0.03
## V21 C24 C24  0.03  0.36
## V22 C24 C24  0.02  0.31
## V23 C24 C24 -0.02  0.72
## V24 C24 C24 -0.02  0.57
## V25 C24 C24  0.00  0.27
## V26 C24 C24 -0.03  0.51
##
## With eigenvalues of:
##  C21  C24
## 10.0  1.4
##
## Purified scale intercorrelations
##  reliabilities on diagonal
##  correlations corrected for attenuation above diagonal:
##      C21   C24
## C21 0.95 -0.01
## C24 0.00  0.60
##
## Cluster fit = 0.91 Pattern fit = 0.99 RMSR = 0.05